84 research outputs found
Projection error evaluation for large multidimensional data sets
This research deals with evaluating the projection error for large data sets using only a personal computer, without any special high-performance computing technologies. A shortcoming of the basic ways of calculating the projection error is that they require a large amount of computer memory, or their computation time is unacceptable when large data sets are analyzed. This paper proposes two ways of evaluating the projection error: the first calculates the projection error not for the full data set but only for a representative data sample; the second obtains the projection error by dividing the data set into smaller data sets. The experiments have been carried out with twelve real and artificial data sets. The computational efficiency of the proposed ways is confirmed by a comprehensive set of comparisons. We demonstrate that dividing a data set into smaller data sets allows us to calculate the projection error for large data sets
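The two evaluation strategies named in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the projection error is taken to be a normalized stress over pairwise distances, the representative sample is a uniform random sample, and the per-subset errors are simply averaged. All function names are hypothetical and are not taken from the paper.

```python
import math
import random


def stress(high, low):
    """Normalized stress between pairwise distances before and after projection.

    Assumption: this stands in for the 'projection error' of the abstract.
    """
    num = 0.0
    den = 0.0
    n = len(high)
    for i in range(n):
        for j in range(i + 1, n):
            d_high = math.dist(high[i], high[j])  # distance in the original space
            d_low = math.dist(low[i], low[j])     # distance in the projection
            num += (d_high - d_low) ** 2
            den += d_high ** 2
    return math.sqrt(num / den) if den > 0 else 0.0


def stress_on_sample(high, low, sample_size, seed=0):
    """Way 1 (assumed form): evaluate the error on a random sample only."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(high)), min(sample_size, len(high)))
    return stress([high[i] for i in idx], [low[i] for i in idx])


def stress_by_subsets(high, low, n_subsets, seed=0):
    """Way 2 (assumed form): split the data into subsets and average their errors."""
    rng = random.Random(seed)
    idx = list(range(len(high)))
    rng.shuffle(idx)
    chunks = [idx[k::n_subsets] for k in range(n_subsets)]
    errors = [
        stress([high[i] for i in c], [low[i] for i in c])
        for c in chunks
        if len(c) > 1  # a subset needs at least one pair of points
    ]
    if not errors:
        raise ValueError("every subset has fewer than two points")
    return sum(errors) / len(errors)
```

Both variants avoid holding the full pairwise distance matrix in memory: the sample-based one touches only `sample_size` points, and the subset-based one works on one chunk at a time.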
A new dimensionality reduction-based visualization approach for massive data
We live in a big data and data analytics era. The volume, velocity, and variety of data generated today require
special methods and techniques for data analysis and inference. Data visualization tools allow us to understand
the data more deeply. One straightforward way of visualizing multidimensional data is based on
dimensionality reduction and is illustrated by a scatter plot. However, visualizing millions of points in a scatter
plot does not make sense. Usually, data sampling or clustering is performed before visualization to reduce the
number of visualized points, but in such a case, meaningful outliers can be rejected and will not be
visualized. In this paper, a new approach for massive data visualization without point overlapping is proposed
and investigated. The approach consists of two main stages: selection of a data subset and its visualization
without overlapping. The experiments have been carried out with ten data sets. The efficiency of subset selection
and visualization of data subset projection is confirmed by a comprehensive set of comparisons
Visual analysis of self-organizing maps
In the article, additional visualization of self-organizing maps (SOM) has been investigated. The main objective of self-organizing maps is to cluster data and present the clusters graphically. The SOM visualization capabilities of four systems (NeNet, SOM-Toolbox, Databionic ESOM and Viscovery SOMine) have been investigated. Each system has its own additional tools for visualizing SOM. A comparative analysis has been made for two data sets: Fisher’s iris data set and the economic indices of the European Union countries. A new SOM system is also introduced and investigated. The system has a specific visualization tool that is missing in other SOM systems. It helps to see the proportions of data items belonging to different classes that fall into the same SOM cell
Software systems for teaching data mining (Programinės sistemos duomenų tyrybos mokymui)
Data mining systems suitable for teaching data mining are reviewed in the paper. Usually, systems such as SPSS Modeler (Clementine), Statistica, and SAS/STAT are used in mathematical statistics courses. They offer many functions but are oriented more toward the problems solved by mathematical statistics, so they are not always suitable for teaching data mining. The WEKA, Orange, KNIME, and RapidMiner systems are more appropriate for this purpose
Semantic segmentation of eye fundus images using convolutional neural networks (Akies dugno nuotraukų semantinis segmentavimas naudojant konvoliucinius neuroninius tinklus)
This article reviews the problems of eye fundus image analysis and the semantic segmentation algorithms used to distinguish the blood vessels of the eye and the optic disc. Various diseases, such as glaucoma, hypertension, diabetic retinopathy, and macular degeneration, can be diagnosed through changes and anomalies of the vessels and the optic disc. Convolutional neural networks, especially the U-Net architecture, are well suited for semantic segmentation. A number of U-Net modifications that deliver excellent performance have recently been developed
Comparative analysis of self-organizing map systems (Saviorganizuojančių neuroninių tinklų sistemų lyginamoji analizė)
Authors: Pavel Stefanovič, Olga Kurasova. In the article, we compare three self-organizing map (SOM) systems: NeNet, SOM-Toolbox, and Databionic ESOM. The main purpose of the systems is to cluster data by similarity and present the clusters graphically on the self-organizing map. Self-organizing maps are one type of artificial neural network. The SOM systems differ from one another in their interfaces, data pre-processing, learning rules, visualization capabilities, etc., so the similarities and differences of the systems are highlighted here. The experiments have been carried out with two data sets: iris and glass. The quantization and topographic errors of the SOMs have been estimated, too
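The quantization and topographic errors mentioned in this summary can be sketched in Python as follows. This is an illustrative sketch using common textbook definitions, not the implementation of any of the compared systems: the quantization error is the mean distance from each data item to its best-matching neuron, and the topographic error is the share of data items whose two best-matching neurons are not neighbours on the map. A rectangular grid with an 8-cell neighbourhood is an assumption, and the function and parameter names are hypothetical.

```python
import math


def quantization_error(data, codebook):
    """Mean distance from each data item to its closest codebook vector."""
    return sum(min(math.dist(x, w) for w in codebook) for x in data) / len(data)


def topographic_error(data, codebook, grid_pos):
    """Share of data items whose best and second-best neurons are not adjacent.

    grid_pos[k] is the (row, column) of neuron k on the map.
    Assumption: rectangular grid, cells touching by side or corner are adjacent.
    """
    errors = 0
    for x in data:
        # Neuron indices ordered by distance to x, closest first
        order = sorted(range(len(codebook)), key=lambda k: math.dist(x, codebook[k]))
        (r1, c1), (r2, c2) = grid_pos[order[0]], grid_pos[order[1]]
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:
            errors += 1
    return errors / len(data)
```

A lower quantization error means the neurons represent the data more closely, while a lower topographic error means the map preserves neighbourhood relations better.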
Combination of vector quantization and multidimensional scaling (Vektorių kvantavimo metodų ir daugiamačių skalių junginys daugiamačiams duomenims vizualizuoti)
Authors: Alma Molytė, Olga Kurasova. In this paper, we present a comparative analysis of combining two neural-network-based vector quantization methods, the self-organizing map (SOM) and neural gas (NG), with multidimensional scaling, which is used to visualize the codebook vectors (neuron-winners) obtained by the vector quantization methods. The dependence of the quantization error on the number of neuron-winners, the quantization and mapping quality, and the preservation of the data structure in the mapping image are investigated. It is established that the quantization errors of NG are smaller than those of the SOM when the numbers of neuron-winners are approximately equal, which means that neural gas is more suitable for vector quantization. The data structure is visible in the mapping image even when the number r of neuron-winners of NG is rather small. If the number r of neuron-winners of the SOM is larger, the data structure is visible as well
Selection of the number of neurons in vector quantization methods (Neuronų skaičiaus parinkimas vektorių kvantavimo metoduose)
In this paper, a strategy for selecting the number of neurons in vector quantization methods has been investigated. Two methods based on neural networks have been analysed: the self-organizing map and neural gas. A way of selecting the number of neurons that takes into account the specifics of the analysed data set is suggested
Proceedings of the conference "Lithuanian MSc Research in Informatics and ICT" (Konferencijos „Lietuvos magistrantų informatikos ir IT tyrimai“ darbai)
The conference "Lithuanian MSc Research in Informatics and ICT" is a venue for presenting the research of Lithuanian MSc theses in informatics and ICT. The aim of the event is to raise the skills of MSc and other students, familiarize them with the research of other students, and encourage their interest in scientific activities. Students from Kaunas University of Technology and Vilnius University will give their presentations at the conference
Investigation of the abilities of data mining systems to analyse various volume datasets (Duomenų tyrybos sistemų galimybių tyrimas įvairių apimčių duomenims analizuoti)
Authors: Kotryna Paulauskienė, Olga Kurasova. As modern information and communication technologies advance, the volumes of processed and stored data grow rapidly, so data analysis becomes ever more complex and it is hard to make fast, effective, and correct decisions. Data mining, the process of extracting useful knowledge from data, is often used for data analysis, and it requires data mining systems able to process data of various volumes. The aim of the paper is to determine what volume of data the popular data mining systems are able to analyse within a reasonable period of time when solving classification and clustering problems. Three open source data mining systems are investigated: WEKA, KNIME, and ORANGE. The computation time of the classification and clustering algorithms implemented in them is compared on datasets of different volumes; because the accuracy achieved within that time matters as well, the values of the accuracy measures obtained in the experiments are also reported. The experiments have been carried out with eight datasets, where the number of attributes was fixed at 100 and the number of instances ranged between 5000 and 600 000. The experimental investigation has shown that when the ORANGE system is used, data of more than 50 000 instances are too large. To analyse larger datasets, the WEKA or KNIME systems need to be used. Data of more than 200 000 instances are too large for WEKA and KNIME; however, when simple classification methods are used, both systems are able to handle 400 000 instances, and KNIME can handle 600 000 instances. The results have shown that KNIME can handle larger datasets than WEKA when applying some classification methods. The accuracy of classification is high enough when the classification methods implemented in the systems are used
- …